Textile weaving dataset for machine learning to predict rejection and production of a weaving factory

Weaving is one of the most widely used fabric manufacturing techniques. The weaving process consists of three major stages: warping, sizing, and weaving. A weaving factory therefore generates a large amount of data. Unfortunately, despite ample scope for statistical analysis, data science, and machine learning, there has been little attempt to apply them to weaving production. The dataset was prepared from the daily production reports of 9 months. The final dataset contains 121,148 entries with 18 parameters, whereas the raw data contains the same number of entries with 22 columns. The raw data required substantial work: combining the daily production reports, treating missing values, renaming columns, and feature engineering to derive EPI, PPI, warp count, weft count, etc. The complete dataset is stored at https://data.mendeley.com/datasets/nxb4shgs9h/1. It was further processed to obtain the rejection dataset, which is stored at https://data.mendeley.com/datasets/6mwgj7tms3/2. Intended future uses of the dataset include predicting weaving waste, investigating the statistical relations among the parameters, and predicting production.

Subject: Textile Engineering
Specific subject area: Weaving (woven fabric is a major fabric type; woven fabrics are produced by weaving factories)
Type of data: Table
How the data were acquired: A weaving factory in Bangladesh, Evince Textiles Ltd., was chosen because it manufactures diversified products from various yarn counts. First, the daily production reports from January to September (9 months) were collected. The daily production reports were then merged with the Pandas library and finally preprocessed to obtain the final dataset. On average, each day's production report has about 270 rows and 22 columns. Each day's production and rejection amount is a combination of 3 shifts (1 shift = 8 hours); hence, the production report was prepared by combining the 3 shifts. An officer collected the data from the batch card attached to the machine at the beginning of each shift's production. He also recorded the production quantity of each shift from the automatic LED display unit of the automatic loom. The daily production report is distributed to the managers and officers of the factory to give an overall idea of the production status of different orders.
Data format: Raw, analyzed, filtered
Description of data collection: The raw data (production record) was collected from Evince Textiles Ltd. from January 2013 to September 2013. The data were then merged, preprocessed, filtered, and feature-engineered to obtain the final dataset in CSV format.
Data source location:

Value of the data
• The textile industry has a lot of data, and factory personnel often look at the data and assume or predict something based on their experience. If statistical tests were employed, these predictions or assumptions could be more accurate and effective.
• This dataset is intended to support building an algorithm that predicts the weaving waste from important fabric parameters such as yarn count, ends and picks per inch, and required quantity. Hence, the production manager may forecast the rejection amount of future woven fabric production.
• The presented dataset also helps to predict woven fabric production.
• The dataset can also be used to find the correlations among weaving production, yarn parameters, fabric rejections, etc.

Objective
Textile industries generate large volumes of data because of their long, interdependent processes. However, there is very limited work on applying machine learning to predict or classify fabric faults. Moreover, total production is currently estimated empirically or from machine speed, although both rejection and production depend on multiple factors such as yarn count, ends per inch (EPI), picks per inch (PPI), order length, etc. This dataset is intended to facilitate rejection and production prediction for the weaving industry.
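As an illustration of the kind of prediction the dataset is meant to support, the following is a minimal sketch of a linear baseline fitted by ordinary least squares. The article does not prescribe a model, so the choice of a linear model is an assumption. The feature order (EPI, PPI, order length) is also an assumption, and the numbers are synthetic values generated from an arbitrary rule; they are not taken from the dataset.

```python
import numpy as np


def fit_linear_baseline(features, target):
    """Ordinary least squares with an intercept.

    Returns an array [intercept, coef_1, ..., coef_k].
    """
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict(coef, features):
    """Apply fitted coefficients to new feature rows."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


# SYNTHETIC rows for illustration only: (epi, ppi, order_length_m).
# These values are made up and do not come from the published dataset.
features = np.array(
    [
        [100, 60, 1000],
        [110, 70, 1500],
        [120, 65, 2000],
        [130, 72, 2500],
        [133, 80, 3000],
        [140, 90, 1200],
        [96, 56, 1800],
        [150, 100, 900],
    ],
    dtype=float,
)

# Targets come from an arbitrary linear rule so the fit can be checked;
# real rejection amounts would be noisy and possibly non-linear.
true_coef = np.array([0.5, 0.01, 0.02, 0.003])
target = true_coef[0] + features @ true_coef[1:]

coef = fit_linear_baseline(features, target)
new_order = np.array([[125.0, 75.0, 2000.0]])
print(predict(coef, new_order))
```

On the real rejection dataset, such a baseline would mainly serve as a reference point against which more flexible models can be compared.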

Data Description
Woven fabric is one of the most commonly used fabric types. It is produced through a long process that includes warping, sizing, and weaving [1]. For weaving, modern air jet or rapier looms are typically used. An overview of the dataset entries is given in Table 1. The weaving management information department of the factory prepared daily production reports, so each day has one production report. Thus, 31 production reports were available for January, 28 for February, and so on. Each day's production report contains the date, order id, fabric construction, loom id (serial number of the loom used), and detailed information about the yarn and fabric specifications. On average, 270 entries were recorded each day, depending on the order quantity, loom stoppage due to mechanical and electrical problems, beam loading and unloading, and other problems. The total number of entries per month is also shown in Table 1.
However, the production report (Table 2) is not usable for data analysis as is, because it contains many missing values (the data entry officers intentionally left rows blank to indicate that the previous record applies). In addition, the reports use 3 rows as headers and include many unnecessary fields, such as loom number, today's delivery, previous delivery, and total delivery, which are irrelevant for machine learning and statistical analysis.

Table 1 Total entries in the dataset with total daily production reports.
Month | Total daily production reports | Total entries in files

Experimental Design, Materials and Methods
The data were collected from the rapier looms (Leonardo, SOMET, and SMIT brands, Italy) of Evince Textiles Ltd. These looms are fully automated and can record all production-related data. The dataset uses 272 days of daily production reports from 172 looms. The raw production data total 121,148 entries with 22 columns.
The production report is made to keep a record of production, not for data analysis. Hence, it contains a lot of unnecessary information, missing values, typing mistakes, and so on. We therefore used several Python libraries to preprocess it, with the Pandas data frame [4] as the main tool. The raw data and the preparation code have been uploaded to GitHub [5].

Preprocessing
First, the daily production reports (272 files for 272 days) were collected in a folder. All the files were then merged into a single file using the Pandas library. The raw data [5] has some columns with acronym names, which were renamed with meaningful ones. The primary dataset contains many missing values, which were left blank intentionally to imply that the previous record applies. Hence, we filled these values with the previous ones. Next, some feature engineering was done, such as deriving the required grey fabric length and the beam length of the required grey fabric. The construction column contains four very important pieces of information: ends per inch (EPI), picks per inch (PPI), warp count, and weft count. This information was split into 4 columns. The final dataset thus has 18 columns. From it, two datasets were created: the full weaving dataset [6] and the rejection dataset [7]. The rejection dataset contains only the important columns and rows (22,010 rows and 14 columns), whereas the full dataset contains all information (121,148 rows and 18 columns). Examples of the full dataset and the rejection dataset are provided in Table 3 and Table 4, respectively. The full dataset contains some null values. These correspond to special supplementary production where extra fabric was planned but, because the order had already been fulfilled, the looms remained idle, i.e., there was no production, yet the entry was kept in the production dataset.
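The merging, forward-filling, and construction-splitting steps described above can be sketched with Pandas as follows. This is a minimal illustration and not the authors' preparation code (which is in the repository cited as [5]). The column names, the sample rows, and the assumed construction format "warp count x weft count / EPI x PPI" (for example "40x40/133x72") are all hypothetical.

```python
import pandas as pd

# ASSUMPTION: construction strings look like "40x40/133x72", meaning
# warp count x weft count / EPI x PPI. The real reports may differ.
CONSTRUCTION_PATTERN = (
    r"^\s*(?P<warp_count>\d+(?:\.\d+)?)\s*[xX*]\s*(?P<weft_count>\d+(?:\.\d+)?)"
    r"\s*/\s*(?P<epi>\d+(?:\.\d+)?)\s*[xX*]\s*(?P<ppi>\d+(?:\.\d+)?)\s*$"
)


def merge_daily_reports(daily_frames):
    """Stack one DataFrame per day into a single table."""
    return pd.concat(daily_frames, ignore_index=True)


def fill_blank_rows(df, columns):
    """Forward-fill cells left blank to mean 'same as the row above'."""
    out = df.copy()
    out[columns] = out[columns].ffill()
    return out


def split_construction(df, column="construction"):
    """Replace the construction column with four numeric columns."""
    parts = df[column].str.extract(CONSTRUCTION_PATTERN).astype(float)
    return pd.concat([df.drop(columns=[column]), parts], axis=1)


# Hypothetical two-day sample; None marks cells the officer left blank.
day1 = pd.DataFrame(
    {
        "date": ["01-01", None, None],
        "order_id": ["A1", None, "B2"],
        "construction": ["40x40/133x72", None, "30x30/130x70"],
        "production_m": [120.0, 95.0, 110.0],
    }
)
day2 = pd.DataFrame(
    {
        "date": ["02-01", None],
        "order_id": ["B2", None],
        "construction": ["30x30/130x70", None],
        "production_m": [105.0, None],  # None here: loom idle, no production
    }
)

merged = merge_daily_reports([day1, day2])
filled = fill_blank_rows(merged, ["date", "order_id", "construction"])
final = split_construction(filled)
print(final)
```

Forward-filling is restricted to the identifier columns in this sketch so that genuinely missing production values, such as those of idle looms, are not overwritten with the previous row's value.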
Describing the parameters:

Ethics Statement
The data of this article involve neither animal nor human participants. In addition, according to the company's data distribution policy, data can be shared for research and non-commercial purposes. Hence, our dataset complies with the data distribution policy of the company.

Table 4 Rejection dataset (first 5 rows).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.